Efficient Exact Inference in Planar Ising Models
We give polynomial-time algorithms for the exact computation of lowest-energy
(ground) states, worst margin violators, log partition functions, and marginal
edge probabilities in certain binary undirected graphical models. Our approach
provides an interesting alternative to the well-known graph cut paradigm in
that it does not impose any submodularity constraints; instead we require
planarity to establish a correspondence with perfect matchings (dimer
coverings) in an expanded dual graph. We implement a unified framework while
delegating complex but well-understood subproblems (planar embedding,
maximum-weight perfect matching) to established algorithms for which efficient
implementations are freely available. Unlike graph cut methods, we can perform
penalized maximum-likelihood as well as maximum-margin parameter estimation in
the associated conditional random fields (CRFs), and employ marginal posterior
probabilities as well as maximum a posteriori (MAP) states for prediction.
Maximum-margin CRF parameter estimation on image denoising and segmentation
problems shows our approach to be efficient and effective. A C++ implementation
is available from http://nic.schraudolph.org/isinf/
Comment: Fixed a number of bugs in v1; added 10 pages of additional figures,
explanations, proofs, and experiments.
Graph Kernels
We present a unified framework to study graph kernels, special cases of which include the random
walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004;
Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time
complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3).
We find a spectral decomposition approach even more efficient when computing entire kernel matrices.
For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3)
time per iteration, where d is the size of the label set. By extending the necessary linear algebra to
Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels,
and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2)
time per iteration in all cases. Experiments on graphs from bioinformatics and other application
domains show that these techniques can speed up computation of the kernel by an order of magnitude
or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when
specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to
R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment
kernel of Fröhlich et al. (2006) yet provably positive semi-definite.
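The complexity gap described in this abstract can be illustrated with a small sketch. It assumes a geometric random walk kernel of the form q^T (I - lam * W)^{-1} p on the product graph, with W the Kronecker product of the two adjacency matrices, uniform start and stop distributions p and q, and a decay factor lam small enough for convergence. These specifics, and the function names, are illustrative assumptions rather than details taken from the abstract. The direct solve works on an n^2 x n^2 system, while the fixed-point iteration never forms the Kronecker product and costs O(n^3) per step.

```python
import numpy as np

def rw_kernel_direct(A, B, lam):
    """Solve the n^2 x n^2 linear system on the product graph explicitly."""
    n, m = A.shape[0], B.shape[0]
    W = np.kron(A, B)                      # product-graph adjacency (assumed form)
    p = np.full(n * m, 1.0 / (n * m))      # uniform start distribution (assumption)
    q = np.full(n * m, 1.0 / (n * m))      # uniform stop distribution (assumption)
    x = np.linalg.solve(np.eye(n * m) - lam * W, p)
    return q @ x

def rw_kernel_fixed_point(A, B, lam, tol=1e-12, max_iter=1000):
    """Iterate X <- P + lam * A X B^T, which equals x <- p + lam * W x
    for row-major vectorization, without ever building W."""
    n, m = A.shape[0], B.shape[0]
    P = np.full((n, m), 1.0 / (n * m))
    X = P.copy()
    for _ in range(max_iter):
        X_new = P + lam * (A @ X @ B.T)    # O(n^3) instead of O(n^4) for W @ x
        converged = np.max(np.abs(X_new - X)) < tol
        X = X_new
        if converged:
            break
    return X.sum() / (n * m)               # q^T vec(X) with uniform q

if __name__ == "__main__":
    triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
    path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    print(rw_kernel_direct(triangle, path, 0.1))
    print(rw_kernel_fixed_point(triangle, path, 0.1))
```

The iteration converges only when lam times the product of the two spectral radii is below 1; for the triangle and path above that product is about 0.28 at lam = 0.1.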
Centering Neural Network Gradient Factors
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [2]. Here we generalize this notion to all factors involved in the network's gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network's generalization ability.
Accelerated gradient descent by factor-centering decomposition
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network's gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests for instance that shortcut connections (a well-known architectural feature) should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the trained network's generalization ability.
Local Gain Adaptation in Stochastic Gradient Descent
Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.